Automated Document Labeling Using Integrated Image and Neural Processing
Abstract
We are developing an automated data entry system that identifies bibliographic information in paper-based documents and converts it to electronic format for inclusion in the MEDLINE database, which is used worldwide by biomedical researchers and clinicians. As part of this effort, we have implemented a new technique that uses integrated image and neural processing to automatically assign meaningful labels such as title, author, affiliation, and abstract to zones in scanned images. Scanned binary document images are first segmented into regular text zones by a commercial 5-engine OCR system. Each text zone is then processed to produce OCR output that includes zone coordinates, text line information, characters and their bounding boxes, confidence levels, font sizes, and certain style attributes. From this output, features for each zone are calculated, normalized, and fed into the input layer of a back-propagation neural network for label classification. Experiments on a variety of medical journals show that the neural network approach to label classification is feasible. Preliminary evaluation on a sample of several thousand images of medical journal pages shows that the system labels text zones with a classification accuracy of 98%.
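The pipeline described in the abstract (per-zone features, normalization, back-propagation classifier) can be illustrated with a minimal sketch. The abstract does not list the exact features, the normalization scheme, or the network dimensions, so everything below is an assumption made for illustration: the zone field names (left, top, right, bottom, font_size, n_lines, n_chars), the seven-feature vector, min-max scaling, a single hidden layer of 8 sigmoid units, and the learning rate. It is not the authors' implementation.

```python
import numpy as np

LABELS = ["title", "author", "affiliation", "abstract"]

def zone_features(zone, page_w, page_h):
    """Build a feature vector from one OCR zone (hypothetical field names)."""
    return np.array([
        zone["left"] / page_w,                    # horizontal position
        zone["top"] / page_h,                     # vertical position
        (zone["right"] - zone["left"]) / page_w,  # relative width
        (zone["bottom"] - zone["top"]) / page_h,  # relative height
        zone["font_size"],
        zone["n_lines"],
        zone["n_chars"],
    ], dtype=float)

def min_max_normalize(X):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with plain back-propagation (squared error)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # forward pass: hidden layer
        O = sigmoid(H @ W2 + b2)             # forward pass: output layer
        dO = (O - Y) * O * (1.0 - O)         # error signal at the output layer
        dH = (dO @ W2.T) * H * (1.0 - H)     # error propagated back to the hidden layer
        W2 -= lr * (H.T @ dO) / n
        b2 -= lr * dO.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Return the label whose output unit is most active for each zone."""
    W1, b1, W2, b2 = params
    O = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return [LABELS[i] for i in O.argmax(axis=1)]

# Toy zones on a 2550 x 3300 pixel page; the values are invented for illustration.
zones = [
    {"left": 300, "top": 200, "right": 2250, "bottom": 330, "font_size": 18, "n_lines": 2, "n_chars": 70},
    {"left": 300, "top": 380, "right": 1500, "bottom": 430, "font_size": 12, "n_lines": 1, "n_chars": 40},
    {"left": 300, "top": 460, "right": 2000, "bottom": 520, "font_size": 9, "n_lines": 2, "n_chars": 120},
    {"left": 300, "top": 600, "right": 2250, "bottom": 1400, "font_size": 10, "n_lines": 18, "n_chars": 1300},
]
X = min_max_normalize(np.array([zone_features(z, 2550, 3300) for z in zones]))
Y = np.eye(len(LABELS))          # one-hot targets, one zone per label
params = train(X, Y)
predicted = predict(params, X)
```

Min-max scaling is used here only because it keeps every input in the range where sigmoid units respond well. A real system would train on many labeled zones and measure accuracy on held-out pages rather than on the training data.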
Similar resources
Identification of selected monogeneans using image processing, artificial neural network and K-nearest neighbor
Abstract Over the last two decades, improvements in computational tools have contributed significantly to the classification of images of biological specimens into their corresponding species. These days, identification of biological species is much easier for taxonomists and even non-taxonomists due to the development of automated computer techniques and systems. In this study, we d...
Automatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique
The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...
Diagnosis of brain tumor using PNN neural networks
Cells grow and need a well-ordered process to create new cells that work properly and maintain the health of the body. When the ability to control cell growth is lost, cells divide uncontrollably and without order. The excess cells form a tissue mass called a tumor. In fact, brain tumors are abnormal and uncontrolled cell proliferations. Segmentation methods are used in b...
An Automated MR Image Segmentation System Using Multi-layer Perceptron Neural Network
Background: Brain tissue segmentation for delineation of 3D anatomical structures from magnetic resonance (MR) images can be used for neuro-degenerative disorders, characterizing morphological differences between subjects based on volumetric analysis of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF), but only if the obtained segmentation results are correct. Due to image arti...
Automatic Detection and Localization of Surface Cracks in Continuously Cast Hot Steel Slabs Using Digital Image Analysis Techniques
Quality inspection is an indispensable part of modern industrial manufacturing. Steel as a major industry requires constant surveillance and supervision through its various stages of production. Continuous casting is a critical step in the steel manufacturing process in which molten steel is solidified into a semi-finished product called slab. Once the slab is released from the casting unit, th...
Publication date: 1999